Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 500 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 31.4 KiB |
| Average record size in memory | 64.3 B |
Variable types
| Numeric | 8 |
|---|---|

| station has unique values | Unique |
|---|---|
| lat has unique values | Unique |
| lng has unique values | Unique |
| geoid has unique values | Unique |
Reproduction
| Analysis started | 2021-05-03 01:05:08.075309 |
|---|---|
| Analysis finished | 2021-05-03 01:12:51.905922 |
| Duration | 7 minutes and 43.83 seconds |
| Software version | pandas-profiling v2.12.0 |
| Download configuration | config.yaml |
station
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1181.836 |
| Minimum | 1 |
|---|---|
| Maximum | 1569 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 130.9 |
| Q1 | 1102.5 |
| median | 1261.5 |
| Q3 | 1423.25 |
| 95-th percentile | 1540.05 |
| Maximum | 1569 |
| Range | 1568 |
| Interquartile range (IQR) | 320.75 |
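To illustrate how the entries of a quantile table relate to one another, the sketch below recomputes the same summary on only the ten `station` values listed under "First rows", so its output will not match the 500-row figures above. It assumes linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`), which should correspond to the interpolation pandas uses by default. The helper name `quantile_summary` is hypothetical.

```python
import statistics


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between neighbouring data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],  # Range = Maximum - Minimum
        "iqr": q3 - q1,               # IQR = Q3 - Q1
    }


# The ten station ids shown under "First rows" (a sample, not the full column).
sample_stations = [1, 10, 1000, 1001, 1002, 1003, 1004, 1006, 1007, 1008]
summary = quantile_summary(sample_stations)
print(summary)
```

The same two identities can be checked against the reported figures: 1569 - 1 = 1568 for the range and 1423.25 - 1102.5 = 320.75 for the IQR.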
Descriptive statistics
| Standard deviation | 374.8967167 |
|---|---|
| Coefficient of variation (CV) | 0.3172155161 |
| Kurtosis | 3.31985844 |
| Mean | 1181.836 |
| Median Absolute Deviation (MAD) | 161 |
| Skewness | -1.973467698 |
| Sum | 590918 |
| Variance | 140547.5482 |
| Monotonicity | Not monotonic |
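Several of the descriptive statistics are simple functions of one another. As a rough consistency check, the sketch below recomputes the sum, the coefficient of variation and the variance from the mean, the standard deviation and the number of observations reported in the tables above. The variable names are illustrative, and agreement is only expected up to the rounding of the printed figures.

```python
import math

# Figures copied from the tables above.
n = 500               # number of observations
mean = 1181.836
std = 374.8967167

total = mean * n      # Sum = Mean * number of observations
cv = std / mean       # Coefficient of variation = Standard deviation / Mean
variance = std ** 2   # Variance = Standard deviation squared

print(round(total), round(cv, 10), round(variance, 4))

# Compare with the reported Sum, CV and Variance, allowing for rounding.
print(math.isclose(total, 590918, rel_tol=1e-6))
print(math.isclose(cv, 0.3172155161, rel_tol=1e-6))
print(math.isclose(variance, 140547.5482, rel_tol=1e-6))
```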
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1024 | 1 | 0.2% |
| 1426 | 1 | 0.2% |
| 1441 | 1 | 0.2% |
| 1439 | 1 | 0.2% |
| 1438 | 1 | 0.2% |
| 1437 | 1 | 0.2% |
| 1436 | 1 | 0.2% |
| 1435 | 1 | 0.2% |
| 1433 | 1 | 0.2% |
| 1432 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 14 | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1569 | 1 | 0.2% |
| 1568 | 1 | 0.2% |
| 1567 | 1 | 0.2% |
| 1566 | 1 | 0.2% |
| 1565 | 1 | 0.2% |
lat
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.31119048 |
| Minimum | 30.03024035 |
|---|---|
| Maximum | 39.98665516 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 30.03024035 |
|---|---|
| 5-th percentile | 30.74124546 |
| Q1 | 33.01165918 |
| median | 35.49866644 |
| Q3 | 37.73823063 |
| 95-th percentile | 39.56235719 |
| Maximum | 39.98665516 |
| Range | 9.95641481 |
| Interquartile range (IQR) | 4.726571443 |
Descriptive statistics
| Standard deviation | 2.811239498 |
|---|---|
| Coefficient of variation (CV) | 0.07961327443 |
| Kurtosis | -1.134837947 |
| Mean | 35.31119048 |
| Median Absolute Deviation (MAD) | 2.379393085 |
| Skewness | -0.09642132679 |
| Sum | 17655.59524 |
| Variance | 7.903067515 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 39.38927718 | 1 | 0.2% |
| 31.35765629 | 1 | 0.2% |
| 37.50112144 | 1 | 0.2% |
| 30.97805448 | 1 | 0.2% |
| 39.54810166 | 1 | 0.2% |
| 37.00519209 | 1 | 0.2% |
| 38.83404573 | 1 | 0.2% |
| 35.22346631 | 1 | 0.2% |
| 36.83168786 | 1 | 0.2% |
| 30.7465718 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 30.03024035 | 1 | 0.2% |
| 30.05365326 | 1 | 0.2% |
| 30.09604681 | 1 | 0.2% |
| 30.10116885 | 1 | 0.2% |
| 30.13536678 | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 39.98665516 | 1 | 0.2% |
| 39.98260317 | 1 | 0.2% |
| 39.97641947 | 1 | 0.2% |
| 39.97043089 | 1 | 0.2% |
| 39.92827387 | 1 | 0.2% |
lng
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -88.40994379 |
| Minimum | -99.90351721 |
|---|---|
| Maximum | -74.69559264 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 500 |
| Negative (%) | 100.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | -99.90351721 |
|---|---|
| 5-th percentile | -98.77882417 |
| Q1 | -94.23691872 |
| median | -88.48051502 |
| Q3 | -82.84815593 |
| 95-th percentile | -76.80850422 |
| Maximum | -74.69559264 |
| Range | 25.20792457 |
| Interquartile range (IQR) | 11.38876279 |
Descriptive statistics
| Standard deviation | 6.917512811 |
|---|---|
| Coefficient of variation (CV) | -0.07824360603 |
| Kurtosis | -1.093432973 |
| Mean | -88.40994379 |
| Median Absolute Deviation (MAD) | 5.724065135 |
| Skewness | 0.1116124007 |
| Sum | -44204.97189 |
| Variance | 47.85198349 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| -93.050766 | 1 | 0.2% |
| -88.0795686 | 1 | 0.2% |
| -96.45525166 | 1 | 0.2% |
| -83.24824816 | 1 | 0.2% |
| -99.58513782 | 1 | 0.2% |
| -88.82630809 | 1 | 0.2% |
| -76.36149476 | 1 | 0.2% |
| -93.66123401 | 1 | 0.2% |
| -90.57076249 | 1 | 0.2% |
| -96.8445264 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| -99.90351721 | 1 | 0.2% |
| -99.82495039 | 1 | 0.2% |
| -99.800411 | 1 | 0.2% |
| -99.75537728 | 1 | 0.2% |
| -99.74653318 | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| -74.69559264 | 1 | 0.2% |
| -74.69617733 | 1 | 0.2% |
| -75.17721728 | 1 | 0.2% |
| -75.32422376 | 1 | 0.2% |
| -75.43986787 | 1 | 0.2% |
geoid
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.94852942 × 10^14 |
| Minimum | 1.0030101 × 10^13 |
|---|---|
| Maximum | 5.40919648 × 10^14 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 1.0030101 × 10^13 |
|---|---|
| 5-th percentile | 1.107050395 × 10^13 |
| Q1 | 2.00363028 × 10^14 |
| median | 2.90398702 × 10^14 |
| Q3 | 4.501921773 × 10^14 |
| 95-th percentile | 5.10740436 × 10^14 |
| Maximum | 5.40919648 × 10^14 |
| Range | 5.30889547 × 10^14 |
| Interquartile range (IQR) | 2.498291493 × 10^14 |
Descriptive statistics
| Standard deviation | 1.545291824 × 10^14 |
|---|---|
| Coefficient of variation (CV) | 0.5240889961 |
| Kurtosis | -1.066633698 |
| Mean | 2.94852942 × 10^14 |
| Median Absolute Deviation (MAD) | 1.18469136 × 10^14 |
| Skewness | -0.1899108044 |
| Sum | 1.47426471 × 10^17 |
| Variance | 2.387926821 × 10^28 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 3.70299501 × 10^14 | 1 | 0.2% |
| 2.80639501 × 10^14 | 1 | 0.2% |
| 2.90630801 × 10^14 | 1 | 0.2% |
| 2.01110008 × 10^14 | 1 | 0.2% |
| 2.80559501 × 10^14 | 1 | 0.2% |
| 5.4087963 × 10^14 | 1 | 0.2% |
| 4.71339506 × 10^14 | 1 | 0.2% |
| 4.50850018 × 10^14 | 1 | 0.2% |
| 3.70499603 × 10^14 | 1 | 0.2% |
| 3.71419203 × 10^14 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1.0030101 × 10^13 | 1 | 0.2% |
| 1.0030104 × 10^13 | 1 | 0.2% |
| 1.0059502 × 10^13 | 1 | 0.2% |
| 1.0059505 × 10^13 | 1 | 0.2% |
| 1.007010003 × 10^13 | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 5.40919648 × 10^14 | 1 | 0.2% |
| 5.4087963 × 10^14 | 1 | 0.2% |
| 5.40859625 × 10^14 | 1 | 0.2% |
| 5.40839659 × 10^14 | 1 | 0.2% |
| 5.40759603 × 10^14 | 1 | 0.2% |
state
Real number (ℝ≥0)
| Distinct | 23 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.378 |
| Minimum | 1 |
|---|---|
| Maximum | 54 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 20 |
| median | 29 |
| Q3 | 45 |
| 95-th percentile | 51 |
| Maximum | 54 |
| Range | 53 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 15.43618046 |
|---|---|
| Coefficient of variation (CV) | 0.5254333333 |
| Kurtosis | -1.067503709 |
| Mean | 29.378 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | -0.1911321125 |
| Sum | 14689 |
| Variance | 238.2756673 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=23)
| Value | Count | Frequency (%) |
|---|---|---|
| 48 | 58 | 11.6% |
| 29 | 41 | 8.2% |
| 37 | 37 | 7.4% |
| 28 | 35 | 7.0% |
| 20 | 32 | 6.4% |
| 40 | 32 | 6.4% |
| 13 | 32 | 6.4% |
| 1 | 31 | 6.2% |
| 5 | 27 | 5.4% |
| 21 | 26 | 5.2% |
| Other values (13) | 149 | 29.8% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 31 | 6.2% |
| 5 | 27 | 5.4% |
| 10 | 2 | 0.4% |
| 12 | 11 | 2.2% |
| 13 | 32 | 6.4% |

| Value | Count | Frequency (%) |
|---|---|---|
| 54 | 15 | 3.0% |
| 51 | 19 | 3.8% |
| 48 | 58 | 11.6% |
| 47 | 23 | 4.6% |
| 45 | 16 | 3.2% |
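The value tables above pair each distinct value with its count and its share of all observations (count divided by 500 for the full column). A minimal sketch of that tabulation follows, using only the ten `state` codes from the "First rows" sample, so the percentages differ from the full-column figures. The helper name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most common value first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common()
    ]


# The ten state codes shown under "First rows" (a sample, not the full column).
sample_states = [48, 20, 51, 48, 48, 13, 51, 12, 45, 48]
table = frequency_table(sample_states)
for value, count, pct in table:
    print(f"| {value} | {count} | {pct:.1f}% |")
```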
county
Real number (ℝ≥0)
| Distinct | 135 |
|---|---|
| Distinct (%) | 27.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 106.702 |
| Minimum | 1 |
|---|---|
| Maximum | 810 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 41 |
| median | 89 |
| Q3 | 141 |
| 95-th percentile | 301.1 |
| Maximum | 810 |
| Range | 809 |
| Interquartile range (IQR) | 100 |
Descriptive statistics
| Standard deviation | 92.91826949 |
|---|---|
| Coefficient of variation (CV) | 0.8708203173 |
| Kurtosis | 9.272092362 |
| Mean | 106.702 |
| Median Absolute Deviation (MAD) | 50 |
| Skewness | 2.291777857 |
| Sum | 53351 |
| Variance | 8633.804806 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 77 | 13 | 2.6% |
| 1 | 11 | 2.2% |
| 3 | 9 | 1.8% |
| 133 | 9 | 1.8% |
| 63 | 9 | 1.8% |
| 5 | 9 | 1.8% |
| 41 | 8 | 1.6% |
| 47 | 8 | 1.6% |
| 21 | 8 | 1.6% |
| 119 | 8 | 1.6% |
| Other values (125) | 408 | 81.6% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 11 | 2.2% |
| 3 | 9 | 1.8% |
| 5 | 9 | 1.8% |
| 7 | 3 | 0.6% |
| 9 | 4 | 0.8% |

| Value | Count | Frequency (%) |
|---|---|---|
| 810 | 1 | 0.2% |
| 503 | 2 | 0.4% |
| 487 | 1 | 0.2% |
| 471 | 1 | 0.2% |
| 457 | 1 | 0.2% |
tract
Real number (ℝ≥0)
| Distinct | 299 |
|---|---|
| Distinct (%) | 59.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 592200.96 |
| Minimum | 100 |
|---|---|
| Maximum | 990200 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 100 |
|---|---|
| 5-th percentile | 1800.95 |
| Q1 | 50478 |
| median | 935654.5 |
| Q3 | 955100 |
| 95-th percentile | 970800.05 |
| Maximum | 990200 |
| Range | 990100 |
| Interquartile range (IQR) | 904622 |
Descriptive statistics
| Standard deviation | 429690.8466 |
|---|---|
| Coefficient of variation (CV) | 0.7255828268 |
| Kurtosis | -1.707153512 |
| Mean | 592200.96 |
| Median Absolute Deviation (MAD) | 37295.5 |
| Skewness | -0.4408795899 |
| Sum | 296100480 |
| Variance | 1.846342236 × 10^11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 950100 | 25 | 5.0% |
| 950200 | 20 | 4.0% |
| 950400 | 15 | 3.0% |
| 950300 | 13 | 2.6% |
| 960100 | 11 | 2.2% |
| 960300 | 8 | 1.6% |
| 950500 | 8 | 1.6% |
| 960200 | 6 | 1.2% |
| 970100 | 6 | 1.2% |
| 970500 | 5 | 1.0% |
| Other values (289) | 383 | 76.6% |

| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 4 | 0.8% |
| 200 | 2 | 0.4% |
| 300 | 3 | 0.6% |
| 400 | 2 | 0.4% |
| 700 | 2 | 0.4% |

| Value | Count | Frequency (%) |
|---|---|---|
| 990200 | 2 | 0.4% |
| 990100 | 1 | 0.2% |
| 990000 | 3 | 0.6% |
| 980100 | 2 | 0.4% |
| 979900 | 1 | 0.2% |
block
Real number (ℝ≥0)
| Distinct | 343 |
|---|---|
| Distinct (%) | 68.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2113.538 |
| Minimum | 1 |
|---|---|
| Maximum | 6069 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1004 |
| Q1 | 1063 |
| median | 2028 |
| Q3 | 3015 |
| 95-th percentile | 4107.6 |
| Maximum | 6069 |
| Range | 6068 |
| Interquartile range (IQR) | 1952 |
Descriptive statistics
| Standard deviation | 1169.818872 |
|---|---|
| Coefficient of variation (CV) | 0.5534884501 |
| Kurtosis | 0.6027893061 |
| Mean | 2113.538 |
| Median Absolute Deviation (MAD) | 973.5 |
| Skewness | 0.9774052511 |
| Sum | 1056769 |
| Variance | 1368476.193 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 3000 | 6 | 1.2% |
| 1005 | 6 | 1.2% |
| 1003 | 5 | 1.0% |
| 1000 | 5 | 1.0% |
| 1048 | 5 | 1.0% |
| 2090 | 4 | 0.8% |
| 1009 | 4 | 0.8% |
| 1004 | 4 | 0.8% |
| 2002 | 4 | 0.8% |
| 1006 | 4 | 0.8% |
| Other values (333) | 453 | 90.6% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2 | 0.4% |
| 2 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 19 | 1 | 0.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 6069 | 1 | 0.2% |
| 6050 | 1 | 0.2% |
| 6036 | 1 | 0.2% |
| 6030 | 1 | 0.2% |
| 5496 | 1 | 0.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its sign does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
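The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself ran; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = pearson_r(xs, [2 * x + 1 for x in xs])      # steep increasing line
shallow = pearson_r(xs, [0.1 * x - 7 for x in xs])  # shallow increasing line
falling = pearson_r(xs, [-3 * x for x in xs])       # decreasing line
print(steep, shallow, falling)
```

Both increasing lines give the same coefficient regardless of slope or offset, and the decreasing line gives the opposite sign, which is the location and scale invariance mentioned in the text.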
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
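A minimal sketch of that recipe follows: convert both variables to ranks, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is an assumption of this example (a common convention), and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this monotonic but nonlinear relationship the rank-based coefficient reaches its maximum while Pearson's r stays below it.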
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
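The pair-counting definition above can be written out directly. The sketch below implements exactly that formula (often called τ-a), counting tied pairs as neither concordant nor discordant. Library implementations typically apply a tie correction (τ-b), so their results can differ when ties are present. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it counts for neither side.
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3], [1, 2, 3]))  # every pair concordant
print(kendall_tau([1, 2, 3], [3, 2, 1]))  # every pair discordant
print(kendall_tau([1, 2, 3], [1, 3, 2]))  # two concordant, one discordant
```

This quadratic-time loop is fine for 500 observations; faster algorithms exist for large inputs.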
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion by eye.
First rows
| | station | lat | lng | geoid | state | county | tract | block |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 33.626177 | -98.066593 | 480770303021036 | 48 | 77 | 30302 | 1036 |
| 1 | 10 | 37.786761 | -98.092724 | 201550018001114 | 20 | 155 | 1800 | 1114 |
| 2 | 1000 | 36.667908 | -77.008185 | 511752004002068 | 51 | 175 | 200400 | 2068 |
| 3 | 1001 | 33.162642 | -98.698637 | 485039504002141 | 48 | 503 | 950400 | 2141 |
| 4 | 1002 | 31.811688 | -95.105910 | 480739508013036 | 48 | 73 | 950801 | 3036 |
| 5 | 1003 | 33.468272 | -83.790157 | 132171002011043 | 13 | 217 | 100201 | 1043 |
| 6 | 1004 | 36.810252 | -77.073923 | 511752001002052 | 51 | 175 | 200100 | 2052 |
| 7 | 1006 | 30.926817 | -85.552415 | 120599601001072 | 12 | 59 | 960100 | 1072 |
| 8 | 1007 | 34.613684 | -82.802887 | 450070107006069 | 45 | 7 | 10700 | 6069 |
| 9 | 1008 | 33.597874 | -96.106257 | 481479505002059 | 48 | 147 | 950500 | 2059 |
Last rows
| | station | lat | lng | geoid | state | county | tract | block |
|---|---|---|---|---|---|---|---|---|
| 490 | 1558 | 38.215510 | -88.726483 | 170810504002075 | 17 | 81 | 50400 | 2075 |
| 491 | 1560 | 33.908145 | -94.230955 | 51330804002025 | 5 | 133 | 80400 | 2025 |
| 492 | 1561 | 35.726361 | -99.534239 | 401299600002117 | 40 | 129 | 960000 | 2117 |
| 493 | 1562 | 36.931543 | -94.454529 | 291450206021036 | 29 | 145 | 20602 | 1036 |
| 494 | 1564 | 36.978617 | -94.204601 | 291450204005055 | 29 | 145 | 20400 | 5055 |
| 495 | 1565 | 37.236228 | -82.537498 | 211959309002009 | 21 | 195 | 930900 | 2009 |
| 496 | 1566 | 37.353235 | -93.458478 | 290770050022044 | 29 | 77 | 5002 | 2044 |
| 497 | 1567 | 35.444186 | -95.975046 | 401110009024139 | 40 | 111 | 902 | 4139 |
| 498 | 1568 | 35.534000 | -97.828454 | 400173008014082 | 40 | 17 | 300801 | 4082 |
| 499 | 1569 | 39.279579 | -80.085089 | 540919648005049 | 54 | 91 | 964800 | 5049 |
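The sample rows suggest that `geoid` is the concatenation of the `state`, `county`, `tract` and `block` codes, which matches the 15-digit census block GEOID layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block). The sketch below splits a `geoid` on that assumption. The helper name `split_geoid` is hypothetical, and the zero-padding step is an assumption needed because the column is stored as a number, so a leading zero is lost (as in row 491).

```python
def split_geoid(geoid):
    """Split a numeric block-level GEOID into its state, county, tract and block codes."""
    digits = str(geoid).zfill(15)  # restore a leading zero dropped by numeric storage
    return {
        "state": int(digits[0:2]),
        "county": int(digits[2:5]),
        "tract": int(digits[5:11]),
        "block": int(digits[11:15]),
    }


# Row 0 of "First rows" and row 491 of "Last rows".
first = split_geoid(480770303021036)
padded = split_geoid(51330804002025)
print(first)
print(padded)
```

Treating `geoid` and its components as identifiers rather than measurements also explains why statistics such as their mean or skewness carry little meaning.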